Abstract: We propose an information-theoretic framework for matrix completion. The theory goes beyond the low-rank structure and applies to general matrices of “low description complexity”. Specifically, we consider random matrices $\mathbf{X} \in \mathbb{R}^{m\times n}$ of arbitrary distribution (continuous, discrete, discrete-continuous mixture, or even singular). With $\mathcal{S} \subseteq \mathbb{R}^{m\times n}$ an $\varepsilon$-support set of $\mathbf{X}$, i.e., $P[\mathbf{X} \in \mathcal{S}] \ge 1-\varepsilon$, and $\underline{\dim}_B(\mathcal{S})$ denoting the lower Minkowski dimension of $\mathcal{S}$, we show that $k > \underline{\dim}_B(\mathcal{S})$ measurements of the form $\langle A_i, \mathbf{X}\rangle$, with $A_i$ denoting the measurement matrices, suffice to recover $\mathbf{X}$ with probability of error at most $\varepsilon$. The result holds for Lebesgue a.a. $A_i$ and does not need incoherence between the $A_i$ and the unknown matrix $\mathbf{X}$. We furthermore show that $k > \underline{\dim}_B(\mathcal{S})$ measurements also suffice to recover the unknown matrix $\mathbf{X}$ from measurements taken with rank-one $A_i$; again, this applies to a.a. rank-one $A_i$. Rank-one measurement matrices are attractive as they require less storage space than general measurement matrices and can be applied faster. Particularizing our results to the recovery of low-rank matrices, we find that $k > (m+n-r)r$ measurements are sufficient to recover matrices of rank at most $r$. Finally, we construct a class of rank-$r$ matrices that can be recovered with arbitrarily small probability of error from $k < (m+n-r)r$ measurements.

I. Introduction 

Matrix completion refers to the recovery of a low-rank 
matrix from a (small) subset of its entries or a (small) 
number of linear combinations of its entries. This problem 
arises in a wide range of applications, including quantum 
state tomography, face recognition, recommender systems, and 
sensor localization (see, e.g., [1] and references therein).

The formal problem statement is as follows. Suppose we have $k$ linear measurements of the $m \times n$ matrix $X$ with $\operatorname{rank}(X) \le r$ in the form of

$$y = (\langle A_1, X\rangle, \dots, \langle A_k, X\rangle)^T \in \mathbb{R}^k,$$

where $A_i \in \mathbb{R}^{m\times n}$ denote the measurement matrices and $\langle\cdot,\cdot\rangle$ stands for the standard trace inner product between matrices in $\mathbb{R}^{m\times n}$. The number of measurements $k$ is typically much smaller than the total number of entries, $mn$, of $X$. Depending on the $A_i$, the measurements can simply be individual entries of $X$ or general linear combinations thereof.
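The measurement model can be made concrete with a few lines of NumPy. The following is a minimal sketch, not part of the original text: the helper name `measure`, the dimensions, and the Gaussian test data are all illustrative assumptions.

```python
import numpy as np

def measure(A_list, X):
    """Return y = (<A_1, X>, ..., <A_k, X>)^T with <A, X> = tr(A^T X)."""
    return np.array([np.trace(A.T @ X) for A in A_list])

rng = np.random.default_rng(0)
m, n, r, k = 4, 5, 2, 6                      # illustrative sizes (assumption)
X = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))  # rank at most r
A_list = [rng.standard_normal((m, n)) for _ in range(k)]
y = measure(A_list, X)

# Sampling a single entry is the special case where A_i has one nonzero entry.
E = np.zeros((m, n))
E[1, 3] = 1.0
entry = measure([E], X)[0]                   # equals X[1, 3]
```

The trace inner product coincides with the entrywise sum of products, which is why a measurement matrix with a single unit entry simply reads out one entry of $X$.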

The vast literature on matrix completion (for a highly incomplete list, see [1]-[9]) provides guarantees for the recovery of the unknown low-rank matrix $X$ from the measurements $y$, under various assumptions on the measurement matrices $A_i$ and the low-rank models generating $X$. For example, in [2] the $A_i$ are assumed to be chosen randomly from an orthonormal basis for $\mathbb{R}^{n\times n}$ and it is shown that an unknown $n\times n$ matrix $X$ of rank at most $r$ can be recovered with high probability if $k = O(nr\nu \ln^2 n)$. Here, $\nu$ quantifies the incoherence between the unknown matrix $X$ and the orthonormal basis for $\mathbb{R}^{n\times n}$ the $A_i$ are drawn from.

The setting in [3] assumes random measurement matrices $A_i$ with the position of the only nonzero entry chosen uniformly at random. It is shown that almost all (a.a.) matrices (with respect to the random orthogonal model [3, Def. 2.1]) of rank at most $r$ can be recovered with high probability (with respect to the measurement matrices) provided that the number of measurements satisfies $k \ge C n^{1.25} r \ln n$, where $C$ is a numerical constant.

In [1] it is shown that for measurement matrices $A_i$ containing i.i.d. entries (that are, e.g., Gaussian), a matrix $X$ of rank at most $r$ can be recovered with high probability from $k \ge C(m+n)r$ measurements, where $C$ is a constant. The recovery guarantees in the works discussed above all pertain to recovery through nuclear norm minimization. In [9] measurement matrices $A_i$ containing i.i.d. entries drawn from an absolutely continuous (with respect to Lebesgue measure) distribution are considered. It is shown that rank minimization (which is NP-hard, in general) recovers an $n\times n$ matrix $X$ of rank at most $r$ with probability one if $k > (2n-r)r$. It is furthermore shown in [9] that all matrices $X$ of rank at most $n/2$ can be recovered, again with probability one, provided that $k \ge 4nr - 4r^2$. The recovery thresholds in [1], [9] do not exhibit a $\log n$ term, but assume significant richness in the random measurement matrices $A_i$. Storing and applying such measurement matrices is costly in terms of memory and computation time. To overcome this problem, [8] considers rank-one measurement matrices of the form $A_i = a_i b_i^T$, where $a_i \in \mathbb{R}^m$ and $b_i \in \mathbb{R}^n$ are independent with i.i.d. Gaussian or sub-Gaussian entries, and shows that nuclear norm minimization succeeds under the same recovery threshold as in [1], namely $k \ge C(m+n)r$.

Contributions: Inspired by the work of Wu and Verdú on analog signal compression [10], we formulate an information-theoretic framework for almost lossless matrix completion. The theory is general in the sense of going beyond the low-rank structure and applying to general matrices of “low description complexity”. Specifically, we consider random matrices $\mathbf{X} \in \mathbb{R}^{m\times n}$ of arbitrary distribution (continuous, discrete, discrete-continuous mixture, or even singular). With $\mathcal{S} \subseteq \mathbb{R}^{m\times n}$ an $\varepsilon$-support set of $\mathbf{X}$, i.e., $P[\mathbf{X} \in \mathcal{S}] \ge 1-\varepsilon$, and $\underline{\dim}_B(\mathcal{S})$ denoting the lower Minkowski dimension (see Definition 3) of $\mathcal{S}$, we show that $k > \underline{\dim}_B(\mathcal{S})$ measurements suffice to recover $\mathbf{X}$ with probability of error no more than $\varepsilon$. The result holds for Lebesgue a.a. measurement matrices $A_i$ and does not need any incoherence between the $A_i$ and the unknown matrix $\mathbf{X}$. What is more, we show that $k > \underline{\dim}_B(\mathcal{S})$ measurements also suffice for recovery from measurements taken with rank-one $A_i$; again, this applies to a.a. rank-one $A_i$.

Particularizing our results to low-rank matrices $\mathbf{X}$, we show that $\mathbf{X}$ of rank at most $r$ can be recovered from $k > (m+n-r)r$ measurements taken with either general $A_i$ or with rank-one $A_i$. Perhaps surprisingly, it turns out that, depending on the specific distribution of the low-rank matrix $\mathbf{X}$, even fewer than $(m+n-r)r$ measurements can suffice. We construct a class of examples that illuminates this phenomenon.

Notation: Roman letters $A, B, \dots$ designate deterministic matrices and $a, b, \dots$ stand for deterministic vectors. Boldface letters $\mathbf{A}, \mathbf{B}, \dots$ and $\mathbf{a}, \mathbf{b}, \dots$ denote random matrices and vectors, respectively. For the distribution of a random matrix $\mathbf{A}$ we write $\mu_{\mathbf{A}}$ and we use $\mu_{\mathbf{a}}$ to designate the distribution of a random vector $\mathbf{a}$. $\lambda^k$ denotes the Lebesgue measure on $\mathbb{R}^k$. The superscript $T$ stands for transposition. For $A = (a_1 \dots a_n) \in \mathbb{R}^{m\times n}$ we let $\operatorname{vec}(A) = (a_1^T \dots a_n^T)^T$. For a rank-$r$ matrix $A \in \mathbb{R}^{m\times n}$ with ordered singular values $\sigma_1(A) \ge \dots \ge \sigma_r(A)$, we set $\Delta(A) = \prod_{i=1}^{r} \sigma_i(A)$. For a matrix $A$, $\operatorname{tr}(A)$ denotes its trace. For matrices $A, B$ of the same dimensions, $\langle A, B\rangle = \operatorname{tr}(A^T B)$ is the trace inner product between $A$ and $B$. We write $\|A\|_2 = \sqrt{\langle A, A\rangle}$ for the Euclidean norm of the matrix $A$. For the Euclidean space $(\mathbb{R}^k, \|\cdot\|_2)$, we denote the open ball of radius $s$ centered at $u \in \mathbb{R}^k$ by $\mathcal{B}_k(u,s)$; $V(k,s)$ and $A(k-1,s)$ stand for its volume and the area of its closure, respectively. Similarly, for the Euclidean space $(\mathbb{R}^{m\times n}, \|\cdot\|_2)$, we denote the open ball of radius $s$ centered at $A \in \mathbb{R}^{m\times n}$ by $\mathcal{B}_{m\times n}(A,s)$. We write $\mathcal{M}_r^{m\times n}$ and $\mathcal{N}_r^{m\times n}$ for the set of matrices $A \in \mathbb{R}^{m\times n}$ with $\operatorname{rank}(A) \le r$ and $\operatorname{rank}(A) = r$, respectively.
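The quantities introduced in the notation can be computed directly. The sketch below is illustrative only: the function names `vec`, `trace_inner`, and `delta` are hypothetical helpers, and the tolerance used to decide which singular values count as nonzero is an assumption.

```python
import numpy as np

def vec(A):
    """Stack the columns of A into one long vector."""
    return A.reshape(-1, order="F")

def trace_inner(A, B):
    """Trace inner product <A, B> = tr(A^T B)."""
    return float(np.trace(A.T @ B))

def delta(A, tol=1e-12):
    """Product of the nonzero singular values of A (tolerance is an assumption)."""
    s = np.linalg.svd(A, compute_uv=False)
    return float(np.prod(s[s > tol]))

A = np.array([[2.0, 0.0, 0.0], [0.0, 3.0, 0.0]])   # rank 2, singular values 3 and 2
B = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
```

With these helpers, `trace_inner(A, B)` agrees with the ordinary inner product of `vec(A)` and `vec(B)`, which is the identification of $\mathbb{R}^{m\times n}$ with $\mathbb{R}^{mn}$ used later in the proofs.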

II. Almost lossless matrix completion 

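We start by formulating the almost lossless matrix completion framework.

Definition 1. For a random matrix $\mathbf{X} \in \mathbb{R}^{m\times n}$ of arbitrary distribution $\mu_{\mathbf{X}}$ with Lebesgue decomposition $\mu_{\mathbf{X}} = \mu^{c}_{\mathbf{X}} + \mu^{d}_{\mathbf{X}} + \mu^{s}_{\mathbf{X}}$ (continuous, discrete, and singular components, respectively), an $(m\times n, k)$ code consists of

(i) linear measurements $(\langle A_1, \cdot\rangle, \dots, \langle A_k, \cdot\rangle)^T \colon \mathbb{R}^{m\times n} \to \mathbb{R}^k$;

(ii) a measurable decoder $g \colon \mathbb{R}^k \to \mathbb{R}^{m\times n}$.

For given measurement matrices $A_i$, we say that a decoder $g$ achieves error probability $\varepsilon$ if

$$P[g((\langle A_1, \mathbf{X}\rangle, \dots, \langle A_k, \mathbf{X}\rangle)^T) \ne \mathbf{X}] \le \varepsilon.$$

Definition 2. For $\varepsilon > 0$, we call a nonempty bounded set $\mathcal{S} \subseteq \mathbb{R}^{m\times n}$ an $\varepsilon$-support set of the random matrix $\mathbf{X} \in \mathbb{R}^{m\times n}$ if $P[\mathbf{X} \in \mathcal{S}] \ge 1-\varepsilon$.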

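Definition 3. (Minkowski dimension$^1$) Let $\mathcal{S}$ be a nonempty bounded set in $\mathbb{R}^{m\times n}$. The lower Minkowski dimension of $\mathcal{S}$ is defined as

$$\underline{\dim}_B(\mathcal{S}) = \liminf_{\rho\to 0} \frac{\log N_{\mathcal{S}}(\rho)}{\log\frac{1}{\rho}}$$

and the upper Minkowski dimension is

$$\overline{\dim}_B(\mathcal{S}) = \limsup_{\rho\to 0} \frac{\log N_{\mathcal{S}}(\rho)}{\log\frac{1}{\rho}},$$

where $N_{\mathcal{S}}(\rho)$ denotes the covering number of $\mathcal{S}$ given by

$$N_{\mathcal{S}}(\rho) = \min\Big\{k \in \mathbb{N} \;\Big|\; \mathcal{S} \subseteq \bigcup_{i\in\{1,\dots,k\}} \mathcal{B}_{m\times n}(M_i,\rho),\ M_i \in \mathbb{R}^{m\times n}\Big\}.$$

If $\underline{\dim}_B(\mathcal{S}) = \overline{\dim}_B(\mathcal{S}) =: \dim_B(\mathcal{S})$, we simply say that $\dim_B(\mathcal{S})$ is the Minkowski dimension of $\mathcal{S}$.

$^1$ This quantity is sometimes also referred to as box-counting dimension, which is the origin of the subscript B in the notation $\dim_B(\cdot)$ used below.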
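The box-counting interpretation suggests a simple numerical estimate. The sketch below is only an illustration under stated assumptions: it counts occupied grid cells of side $\rho$ instead of minimal ball covers (the two counts differ by constant factors, which do not affect the dimension), it works with a finite point sample of the set, and the names `box_count` and `estimate_dimension` are hypothetical. The test set is a line segment of $2\times 2$ matrices, whose Minkowski dimension is 1.

```python
import numpy as np

def box_count(points, rho):
    """Number of grid cells of side rho that contain at least one point."""
    cells = np.floor(points / rho).astype(np.int64)
    return len(np.unique(cells, axis=0))

def estimate_dimension(points, rhos):
    """Least-squares slope of log N(rho) versus log(1/rho)."""
    counts = [box_count(points, rho) for rho in rhos]
    slope, _ = np.polyfit(np.log(1.0 / np.asarray(rhos)), np.log(counts), 1)
    return slope

# A one-parameter family {t * M : 0 <= t <= 1} of 2 x 2 matrices, stored as vec(M).
M = np.array([1.0, 1.0 / np.sqrt(2.0), 1.0 / np.sqrt(3.0), 1.0 / np.sqrt(5.0)])
t = np.linspace(0.0, 1.0, 200_000)
segment = t[:, None] * M[None, :]

rhos = [2.0 ** (-j) for j in range(3, 9)]
dim_estimate = estimate_dimension(segment, rhos)   # close to 1 for a segment
```

A finite-scale slope is only a proxy for the limit in Definition 3; it says nothing about sets whose covering numbers oscillate across scales, which is exactly where the lower and upper dimensions can differ.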

III. Main results 

The following result formalizes the statement on the operationally relevant description complexity being given by the lower Minkowski dimensions of $\varepsilon$-support sets of $\mathbf{X}$.

Theorem 1. Let $\mathcal{S} \subseteq \mathbb{R}^{m\times n}$ be an $\varepsilon$-support set of $\mathbf{X} \in \mathbb{R}^{m\times n}$. Then, for Lebesgue a.a. measurement matrices $A_i$, $i = 1,\dots,k$, there exists a decoder achieving error probability $\varepsilon$, provided that $k > \underline{\dim}_B(\mathcal{S})$.

Proof. See Section V. □

Remark 1. The central conceptual element in the proof of Theorem 1 is the following probabilistic null space property, first reported in [11] in the context of almost lossless analog signal separation. For a.a. measurement matrices $A_i$, $i = 1,\dots,k$, the dimension of the kernel of the mapping $X \mapsto (\langle A_1, X\rangle, \dots, \langle A_k, X\rangle)^T$ is $mn - k$. If the lower Minkowski dimension of a set $\mathcal{S}$ is smaller than $k$, the set $\mathcal{S}$ will intersect the kernel of this mapping at most trivially. What is remarkable here is that the notions of Euclidean dimension (for the kernel of the mapping) and of lower Minkowski dimension (for $\mathcal{S}$) are compatible.
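The kernel dimension $mn - k$ mentioned in Remark 1 is easy to observe numerically. The following sketch is an illustration, assuming Gaussian measurement matrices as one convenient way to draw generic $A_i$; the variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, k = 3, 4, 5                                  # illustrative sizes (assumption)

# Row i of the k x mn matrix is vec(A_i)^T, so <A_i, X> = row_i @ vec(X).
A_list = [rng.standard_normal((m, n)) for _ in range(k)]
measurement_map = np.stack([A.reshape(-1, order="F") for A in A_list])

rank = np.linalg.matrix_rank(measurement_map)
kernel_dim = m * n - rank                          # rank-nullity theorem
```

For generic $A_i$ the stacked matrix has full row rank $k$, so the kernel has dimension $mn - k$; the content of Theorem 1 is that a set of lower Minkowski dimension below $k$ generically meets this kernel only at the origin.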

We next particularize Theorem 1 for low-rank matrices. To this end, we first establish an upper bound on $\overline{\dim}_B(\mathcal{S})$ for nonempty and bounded subsets of $\mathcal{M}_r^{m\times n}$.

Lemma 1. Let $\mathcal{S} \subseteq \mathcal{M}_r^{m\times n}$ be a nonempty bounded set. Then

$$\overline{\dim}_B(\mathcal{S}) \le (m+n-r)r.$$

Proof. We can decompose $\mathcal{M}_r^{m\times n}$ according to

$$\mathcal{M}_r^{m\times n} = \bigcup_{i=0}^{r} \mathcal{N}_i^{m\times n}.$$

By [12, Ex. 5.30], $\mathcal{N}_i^{m\times n}$ is an embedded submanifold of $\mathbb{R}^{m\times n}$ of dimension $(m+n-i)i$, $i = 1,\dots,r$. Let $\mathcal{I} = \{i \in \{1,\dots,r\} \mid \mathcal{S} \cap \mathcal{N}_i^{m\times n} \ne \emptyset\}$. Then, for each $i \in \mathcal{I}$, $\mathcal{S} \cap \mathcal{N}_i^{m\times n}$ is a nonempty bounded set and, therefore, $\overline{\dim}_B(\mathcal{S} \cap \mathcal{N}_i^{m\times n})$ is well-defined. By [13, Sec. 3.2, Properties (i) and (ii)], $\overline{\dim}_B(\mathcal{S} \cap \mathcal{N}_i^{m\times n}) \le (m+n-i)i$, $i \in \mathcal{I}$. Since the upper Minkowski dimension is finitely stable [13, Sec. 3.2, Property (iii)], we get

$$\overline{\dim}_B(\mathcal{S}) = \max_{i\in\mathcal{I}} \overline{\dim}_B(\mathcal{S} \cap \mathcal{N}_i^{m\times n}) \le (m+n-r)r,$$

where in the last step we used the monotonicity of $f(s) = (m+n-s)s$ in the range $s \in [0, (m+n)/2]$ together with $r \le (m+n)/2$, which in turn follows from $r \le \min(m,n)$. □
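The monotonicity step at the end of the proof can be checked by direct enumeration. This is a small illustrative sketch; `manifold_dim` and `lemma1_bound` are hypothetical helper names and the sizes are arbitrary.

```python
def manifold_dim(m, n, i):
    """Dimension (m + n - i) * i of the set of m x n matrices of rank exactly i."""
    return (m + n - i) * i

def lemma1_bound(m, n, r):
    """Largest stratum dimension over ranks 1..r, for r <= min(m, n)."""
    assert 1 <= r <= min(m, n)
    return max(manifold_dim(m, n, i) for i in range(1, r + 1))

m, n = 6, 9                                        # illustrative sizes (assumption)
dims = [manifold_dim(m, n, i) for i in range(1, min(m, n) + 1)]
```

Because $(m+n-i)i$ is increasing in $i$ up to $(m+n)/2 \ge \min(m,n)$, the maximum over the rank strata is always attained at the largest rank $r$, and at $r = \min(m,n)$ it reaches the full dimension $mn$.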

We can now put the pieces together to get the desired statement on low-rank matrices.

Remark 2. Lemma 1 together with $\underline{\dim}_B(\cdot) \le \overline{\dim}_B(\cdot)$, when used in Theorem 1, implies that for $\mathbf{X} \in \mathcal{M}_r^{m\times n}$ and every $\varepsilon > 0$, there exists a decoder that achieves error probability $\varepsilon$ for Lebesgue a.a. measurement matrices $A_i$, $i = 1,\dots,k$, provided that $k > (m+n-r)r$.

While the sufficient condition $k > (m+n-r)r$ in Remark 2 is intuitively appealing as $(m+n-r)r$ is the dimension of the manifold $\mathcal{N}_r^{m\times n}$, it is actually the lower Minkowski dimensions of $\varepsilon$-support sets of $\mathbf{X} \in \mathcal{M}_r^{m\times n}$ that are of operational significance. Specifically, depending on the distribution of $\mathbf{X}$, a smaller (than $(m+n-r)r$) number of measurements may suffice for recovery of $\mathbf{X}$ with probability of error at most $\varepsilon$. The following example illuminates this phenomenon.

Example 1. Let $\mathbf{X} = \mathbf{X}_1^T \mathbf{X}_2 \in \mathcal{M}_r^{m\times n}$, where $\mathbf{X}_1 \in \mathbb{R}^{r\times m}$ and $\mathbf{X}_2 \in \mathbb{R}^{r\times n}$ are independent. Suppose that $\mathbf{X}_1$ has $l_1$ columns at positions drawn uniformly at random and containing i.i.d. Gaussian entries with all other columns equal to zero and $\mathbf{X}_2$ has $l_2$ columns at positions drawn uniformly at random and containing i.i.d. Gaussian entries with all other columns equal to zero. Suppose further that $r \le l_1 \le m/2$ and $r \le l_2 \le n/2 - 1/r$. The assumptions $l_i \ge r$, $i = 1,2$, guarantee that $P[\operatorname{rank}(\mathbf{X}) = r] = 1$. Next, we construct an $\varepsilon$-support set $\mathcal{T}$ for $\mathbf{X}$ with $\overline{\dim}_B(\mathcal{T}) \le (l_1+l_2)r$, which by Theorem 1 together with $\underline{\dim}_B(\mathcal{T}) \le \overline{\dim}_B(\mathcal{T})$ and $(l_1+l_2)r \le (m+n)r/2 - 1 \le (m+n-r)r - 1$ proves that we can recover the rank-$r$ matrix $\mathbf{X}$ with probability of error at most $\varepsilon$ from strictly less than $(m+n-r)r$ measurements.

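Let $\mathcal{A}_l^{r\times m} \subseteq \mathbb{R}^{r\times m}$ be the set of $r\times m$ matrices with no more than $l$ nonzero columns. Choose $L \in \mathbb{N}$ sufficiently large for (i) $\mathcal{S}_1 = \mathcal{A}_{l_1}^{r\times m} \cap \mathcal{B}_{r\times m}(0,L)$ to be an $\varepsilon/2$-support set of $\mathbf{X}_1$ and (ii) $\mathcal{S}_2 = \mathcal{A}_{l_2}^{r\times n} \cap \mathcal{B}_{r\times n}(0,L)$ to be an $\varepsilon/2$-support set of $\mathbf{X}_2$. By [13, Sec. 3.2, Properties (i) and (iii)], we have

$$\dim_B(\mathcal{S}_i) = l_i r, \tag{1}$$

which is simply the maximum number of nonzero entries of $X_i \in \mathcal{S}_i$, $i = 1,2$. Set $\mathcal{T} = \{X_1^T X_2 \mid X_i \in \mathcal{S}_i,\ i = 1,2\}$. Then,

\begin{align}
P[\mathbf{X} \in \mathcal{T}] &\ge P[\mathbf{X}_1 \in \mathcal{S}_1, \mathbf{X}_2 \in \mathcal{S}_2] \notag\\
&= P[\mathbf{X}_1 \in \mathcal{S}_1]\, P[\mathbf{X}_2 \in \mathcal{S}_2] \notag\\
&\ge 1 - \varepsilon. \notag
\end{align}

The triangle inequality implies that for all $X_i, \tilde{X}_i \in \mathcal{S}_i$, $i = 1,2$, we have

\begin{align}
\|X_1^T X_2 - \tilde{X}_1^T \tilde{X}_2\|_2
&\le \|X_1^T X_2 - X_1^T \tilde{X}_2\|_2 + \|X_1^T \tilde{X}_2 - \tilde{X}_1^T \tilde{X}_2\|_2 \notag\\
&\le L\big(\|X_1 - \tilde{X}_1\|_2 + \|X_2 - \tilde{X}_2\|_2\big), \tag{2}
\end{align}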


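where we used $\mathcal{S}_1 \subseteq \mathcal{B}_{r\times m}(0,L)$ and $\mathcal{S}_2 \subseteq \mathcal{B}_{r\times n}(0,L)$. Let $N_{\mathcal{S}_i}(\rho)$ be the covering number of $\mathcal{S}_i$, $i = 1,2$. We can cover $\mathcal{S}_i$ by $N_{\mathcal{S}_i}(\rho)$ balls of radius $\rho$ with centers $X^{(i)}_{j_i}$, $j_i = 1,\dots,N_{\mathcal{S}_i}(\rho)$, $i = 1,2$. Therefore, (2) implies that $\mathcal{T}$ can be covered by $N_{\mathcal{S}_1}(\rho) N_{\mathcal{S}_2}(\rho)$ balls of radius $2L\rho$ centered at $(X^{(1)}_{j_1})^T X^{(2)}_{j_2}$, $j_i = 1,\dots,N_{\mathcal{S}_i}(\rho)$, $i = 1,2$. This yields $N_{\mathcal{T}}(2L\rho) \le N_{\mathcal{S}_1}(\rho) N_{\mathcal{S}_2}(\rho)$ and we finally get

\begin{align}
\overline{\dim}_B(\mathcal{T}) &= \limsup_{\rho\to 0} \frac{\log N_{\mathcal{T}}(2L\rho)}{\log\frac{1}{2L\rho}} \notag\\
&\le \limsup_{\rho\to 0} \frac{\log\big(N_{\mathcal{S}_1}(\rho) N_{\mathcal{S}_2}(\rho)\big)}{\log\frac{1}{2L\rho}} \notag\\
&= \lim_{\rho\to 0} \frac{\log N_{\mathcal{S}_1}(\rho)}{\log\frac{1}{2L\rho}} + \lim_{\rho\to 0} \frac{\log N_{\mathcal{S}_2}(\rho)}{\log\frac{1}{2L\rho}} \notag\\
&= l_1 r + l_2 r, \notag
\end{align}

where we used (1) in the last step.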
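The construction in Example 1 can be simulated to compare the two measurement counts. The sketch below is illustrative: the helper `sparse_column_factor`, the seed, and the specific sizes are assumptions chosen to satisfy the constraints of the example.

```python
import numpy as np

def sparse_column_factor(r, width, num_cols, rng):
    """r x width matrix with num_cols Gaussian columns at random positions, zeros elsewhere."""
    F = np.zeros((r, width))
    cols = rng.choice(width, size=num_cols, replace=False)
    F[:, cols] = rng.standard_normal((r, num_cols))
    return F

rng = np.random.default_rng(2)
m, n, r = 12, 14, 2
l1, l2 = 3, 4                      # satisfy r <= l1 <= m/2 and r <= l2 <= n/2 - 1/r
X1 = sparse_column_factor(r, m, l1, rng)
X2 = sparse_column_factor(r, n, l2, rng)
X = X1.T @ X2                      # m x n matrix, rank r with probability one

example_bound = (l1 + l2) * r      # bound on the dimension of the support set T
manifold_bound = (m + n - r) * r   # dimension of the rank-r manifold
```

For these sizes the example needs more than 14 measurements, compared with more than 48 for the generic rank-2 bound; the gap comes from the sparsity pattern of the factors, not from the rank.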

Remark 3. The derivation of the recovery thresholds in [9] is also based on a null space property similar to the one discussed in Remark 1. The relevant dimension in [9] is the dimension $(m+n-r)r$ of the manifold $\mathcal{N}_r^{m\times n}$. Example 1 above, however, shows that $k < (m+n-r)r$ measurements can suffice for recovery of rank-$r$ matrices, thereby corroborating the operational significance of the lower Minkowski dimensions of $\varepsilon$-support sets of $\mathbf{X}$.


IV. Rank-one measurement matrices 

Rank-one measurement matrices, i.e., matrices $A_i = a_i b_i^T$ with $a_i \in \mathbb{R}^m$ and $b_i \in \mathbb{R}^n$, $i = 1,\dots,k$, are attractive as they require less storage space than general measurement matrices and can also be applied faster. Interestingly, Theorem 1 continues to hold for rank-one measurement matrices although they exhibit much less richness than general measurement matrices. The technical challenges in establishing this result are quite different from those encountered in the case of general measurement matrices. In particular, we will need a stronger concentration of measure inequality (cf. Lemma 4).

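Theorem 2. Let $\mathcal{S} \subseteq \mathbb{R}^{m\times n}$ be an $\varepsilon$-support set of $\mathbf{X} \in \mathbb{R}^{m\times n}$. Then, for Lebesgue a.a. $a_i \in \mathbb{R}^m$ and $b_i \in \mathbb{R}^n$ and corresponding measurement matrices $A_i = a_i b_i^T$, $i = 1,\dots,k$, there exists a decoder achieving error probability $\varepsilon$, provided that $k > \underline{\dim}_B(\mathcal{S})$.

Proof. See Section V. □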

Remark 4. Example 1 can be shown to carry over to rank-one measurement matrices $A_i$.

Remark 5. Theorem 2, when used in combination with Lemma 1, implies that for $\mathbf{X} \in \mathcal{M}_r^{m\times n}$ and every $\varepsilon > 0$, there exists a decoder achieving error probability $\varepsilon$ for Lebesgue a.a. $a_i \in \mathbb{R}^m$ and $b_i \in \mathbb{R}^n$, provided that $k > (m+n-r)r$. In contrast, the threshold $k \ge C(m+n)r$ in [8] for rank-one measurements requires $\mathbf{a}_i$ and $\mathbf{b}_i$ to be independent random vectors containing i.i.d. Gaussian or sub-Gaussian entries. In addition, the constant $C$ in [8] remains unspecified.
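The storage and speed advantage of rank-one measurements rests on the identity $\langle a b^T, X\rangle = a^T X b$. A minimal sketch follows, with illustrative sizes and Gaussian test vectors as assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, k = 5, 7, 4                 # illustrative sizes (assumption)
X = rng.standard_normal((m, n))
a = rng.standard_normal((k, m))   # row i holds a_i
b = rng.standard_normal((k, n))   # row i holds b_i

# Direct evaluation: build A_i = a_i b_i^T and take the trace inner product.
y_full = np.array([np.sum(np.outer(a[i], b[i]) * X) for i in range(k)])

# Factored evaluation: <a_i b_i^T, X> = a_i^T X b_i, no m x n matrix is formed.
y_fast = np.einsum("im,mn,in->i", a, X, b)

storage_rank_one = k * (m + n)    # numbers stored for k rank-one matrices
storage_general = k * m * n       # numbers stored for k general matrices
```

Each rank-one measurement is described by $m+n$ numbers instead of $mn$, and the factored form never materializes the outer product.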






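V. Proofs of Theorems 1 and 2

For both proofs, we first construct a measurable map $g \colon \mathbb{R}^k \to \mathbb{R}^{m\times n}$ such that

\begin{align}
&P[g((\langle A_1, \mathbf{X}\rangle, \dots, \langle A_k, \mathbf{X}\rangle)^T) \ne \mathbf{X}] \notag\\
&\quad\le P[\exists Z \in \mathcal{S}_{\mathbf{X}}\setminus\{0\} : (\langle A_1, Z\rangle, \dots, \langle A_k, Z\rangle)^T = 0,\ \mathbf{X} \in \mathcal{S}] + \varepsilon \tag{3}
\end{align}

with $\mathcal{S}_X = \{W - X \mid W \in \mathcal{S}\}$ for $X \in \mathcal{S}$. The proofs are then concluded by showing that

$$P[\exists Z \in \mathcal{S}_{\mathbf{X}}\setminus\{0\} : (\langle A_1, Z\rangle, \dots, \langle A_k, Z\rangle)^T = 0,\ \mathbf{X} \in \mathcal{S}] = 0 \tag{4}$$

for Lebesgue a.a. matrices $A_i \in \mathbb{R}^{m\times n}$, $i = 1,\dots,k$, in the case of Theorem 1 and for Lebesgue a.a. vectors $a_i \in \mathbb{R}^m$ and $b_i \in \mathbb{R}^n$ with $A_i = a_i b_i^T$, $i = 1,\dots,k$, in the case of Theorem 2.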

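Proof of (3): Let $\mathcal{S} \subseteq \mathbb{R}^{m\times n}$ be an $\varepsilon$-support set of $\mathbf{X}$ with $\underline{\dim}_B(\mathcal{S}) < k$. We define a measurable map $g$ as follows:

$$g(y) = \begin{cases} Z, & \text{if } \{W \in \mathcal{S} \mid (\langle A_1, W\rangle, \dots, \langle A_k, W\rangle)^T = y\} = \{Z\} \\ E, & \text{else} \end{cases}$$

where $E$ is an arbitrary, but fixed, matrix in $\mathbb{R}^{m\times n}\setminus\mathcal{S}$ (used to declare a decoding error). Then, we have

\begin{align}
&P[g((\langle A_1, \mathbf{X}\rangle, \dots, \langle A_k, \mathbf{X}\rangle)^T) \ne \mathbf{X}] \notag\\
&\quad\le P[g((\langle A_1, \mathbf{X}\rangle, \dots, \langle A_k, \mathbf{X}\rangle)^T) \ne \mathbf{X},\ \mathbf{X} \in \mathcal{S}] + P[\mathbf{X} \notin \mathcal{S}] \notag\\
&\quad\le P[g((\langle A_1, \mathbf{X}\rangle, \dots, \langle A_k, \mathbf{X}\rangle)^T) \ne \mathbf{X},\ \mathbf{X} \in \mathcal{S}] + \varepsilon \tag{5}\\
&\quad= P[g((\langle A_1, \mathbf{X}\rangle, \dots, \langle A_k, \mathbf{X}\rangle)^T) = E,\ \mathbf{X} \in \mathcal{S}] + \varepsilon \tag{6}\\
&\quad= P[\exists Z \in \mathcal{S}_{\mathbf{X}}\setminus\{0\} : (\langle A_1, Z\rangle, \dots, \langle A_k, Z\rangle)^T = 0,\ \mathbf{X} \in \mathcal{S}] + \varepsilon, \notag
\end{align}

where (5) is a consequence of $\mathcal{S}$ being an $\varepsilon$-support set and in (6) we used that the decoder declares an error if and only if $|\{W \in \mathcal{S} \mid (\langle A_1, W\rangle, \dots, \langle A_k, W\rangle)^T = y\}| > 1$ for $y = (\langle A_1, X\rangle, \dots, \langle A_k, X\rangle)^T$ with $X \in \mathcal{S}$.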
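The decoder $g$ is defined through a uniqueness test over the support set and is not meant as a practical algorithm. Its logic can nonetheless be illustrated on a finite stand-in for $\mathcal{S}$, which is an assumption made purely for this sketch (a finite set has Minkowski dimension zero). The names `decode` and `error_marker`, the numerical tolerance, and the use of a NaN matrix for the error symbol $E$ are likewise hypothetical choices.

```python
import numpy as np

def measure(A_list, W):
    """y = (<A_1, W>, ..., <A_k, W>)^T."""
    return np.array([np.sum(A * W) for A in A_list])

def decode(y, A_list, candidates, error_marker, tol=1e-9):
    """Return the unique candidate consistent with y, else the error marker E."""
    matches = [W for W in candidates if np.allclose(measure(A_list, W), y, atol=tol)]
    return matches[0] if len(matches) == 1 else error_marker

rng = np.random.default_rng(4)
m, n, k = 3, 3, 2
candidates = [rng.standard_normal((m, n)) for _ in range(10)]   # finite stand-in for S
A_list = [rng.standard_normal((m, n)) for _ in range(k)]
E = np.full((m, n), np.nan)                                     # marker outside S

X = candidates[7]
X_hat = decode(measure(A_list, X), A_list, candidates, E)
```

The decoder errs exactly when a second element of the candidate set produces the same measurements, i.e., when a nonzero difference $W - X$ lies in the kernel of the measurement map, which is the event appearing on the right-hand side of (3).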

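Finishing the proof of Theorem 1: Let $s > 0$ and suppose that $\mathbf{A}_i$, $i = 1,\dots,k$, are independent and uniformly distributed on $\mathcal{B}_{m\times n}(0,s)$. Then, we have

\begin{align}
&\int_{(\mathcal{B}_{m\times n}(0,s))^k} P[\exists Z \in \mathcal{S}_{\mathbf{X}}\setminus\{0\} : (\langle A_1, Z\rangle, \dots, \langle A_k, Z\rangle)^T = 0,\ \mathbf{X} \in \mathcal{S}]\; d\mu_{\mathbf{A}_1} \times \dots \times d\mu_{\mathbf{A}_k} \notag\\
&\quad= \int_{\mathcal{S}} P[\exists Z \in \mathcal{S}_X\setminus\{0\} : (\langle \mathbf{A}_1, Z\rangle, \dots, \langle \mathbf{A}_k, Z\rangle)^T = 0]\; d\mu_{\mathbf{X}}(X) \tag{7}\\
&\quad= 0, \tag{8}
\end{align}

where (7) is a consequence of Fubini’s theorem for nonnegative measurable functions and (8) follows from Lemma 2 below. With $\mathbb{R}^{m\times n} = \bigcup_{s\in\mathbb{N}} \mathcal{B}_{m\times n}(0,s)$ and since $s$ is arbitrary, (4) holds for Lebesgue a.a. measurement matrices $A_i$, which concludes the proof of Theorem 1. □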

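Finishing the proof of Theorem 2: Let $s > 0$ and suppose that $\mathbf{A} = [\mathbf{a}_1 \dots \mathbf{a}_k] \in \mathbb{R}^{m\times k}$ and $\mathbf{B} = [\mathbf{b}_1 \dots \mathbf{b}_k] \in \mathbb{R}^{n\times k}$ are independent random matrices with columns $\mathbf{a}_i$ independent and uniformly distributed on $\mathcal{B}_m(0,s)$ and columns $\mathbf{b}_i$ independent and uniformly distributed on $\mathcal{B}_n(0,s)$. Then, we have

\begin{align}
&\int_{(\mathcal{B}_m(0,s)\times\mathcal{B}_n(0,s))^k} P[\exists Z \in \mathcal{S}_{\mathbf{X}}\setminus\{0\} : (a_1^T Z b_1, \dots, a_k^T Z b_k)^T = 0,\ \mathbf{X} \in \mathcal{S}]\; d\mu_{\mathbf{a}_1} \times d\mu_{\mathbf{b}_1} \times \dots \times d\mu_{\mathbf{a}_k} \times d\mu_{\mathbf{b}_k} \notag\\
&\quad= \int_{\mathcal{S}} P[\exists Z \in \mathcal{S}_X\setminus\{0\} : (\mathbf{a}_1^T Z \mathbf{b}_1, \dots, \mathbf{a}_k^T Z \mathbf{b}_k)^T = 0]\; d\mu_{\mathbf{X}}(X) \tag{9}\\
&\quad= 0, \tag{10}
\end{align}

where (9) is a consequence of Fubini’s theorem for nonnegative measurable functions and (10) follows from Lemma 3 below. Again, with $\mathbb{R}^{l} = \bigcup_{s\in\mathbb{N}} \mathcal{B}_l(0,s)$ and since $s$ is arbitrary, (4) holds for Lebesgue a.a. vectors $a_i \in \mathbb{R}^m$ and $b_i \in \mathbb{R}^n$, thereby finishing the proof of Theorem 2. □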

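Lemma 2. Let $s > 0$ and $\mathbf{A}_i$, $i = 1,\dots,k$, be independent and uniformly distributed on $\mathcal{B}_{m\times n}(0,s)$. Suppose that $\mathcal{U} \subseteq \mathbb{R}^{m\times n}$ is a nonempty bounded set with $\underline{\dim}_B(\mathcal{U}) < k$. Then, we have

$$P[\exists X \in \mathcal{U}\setminus\{0\} : (\langle \mathbf{A}_1, X\rangle, \dots, \langle \mathbf{A}_k, X\rangle)^T = 0] = 0.$$

Proof. Follows from rewriting the trace inner products $\langle \mathbf{A}_i, X\rangle$, $i = 1,\dots,k$, as inner products between vectors in $\mathbb{R}^{mn}$ and subsequent application of [11, Prop. 1]. □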

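Lemma 3. Let $s > 0$ and take $\mathbf{A} = [\mathbf{a}_1 \dots \mathbf{a}_k] \in \mathbb{R}^{m\times k}$ and $\mathbf{B} = [\mathbf{b}_1 \dots \mathbf{b}_k] \in \mathbb{R}^{n\times k}$ to be independent random matrices with columns $\mathbf{a}_i$, $i = 1,\dots,k$, independent and uniformly distributed on $\mathcal{B}_m(0,s)$ and columns $\mathbf{b}_i$, $i = 1,\dots,k$, independent and uniformly distributed on $\mathcal{B}_n(0,s)$. Suppose that $\mathcal{U} \subseteq \mathbb{R}^{m\times n}$ is a nonempty bounded set with $\underline{\dim}_B(\mathcal{U}) < k$. Then, we have

$$P := P[\exists X \in \mathcal{U}\setminus\{0\} : (\mathbf{a}_1^T X \mathbf{b}_1, \dots, \mathbf{a}_k^T X \mathbf{b}_k)^T = 0] = 0.$$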

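Proof. Let $R = \max_{X\in\mathcal{U}} \operatorname{rank}(X)$ and set

$$\mathcal{U}_{L,r} = \{X \in \mathcal{U} \mid \Delta(X) > 1/L,\ \sigma_1(X) < L,\ \operatorname{rank}(X) = r\}$$

for $L \in \mathbb{N}$ and $r = 1,\dots,R$. By the union bound, we have

$$P \le \sum_{L\in\mathbb{N}} \sum_{r=1}^{R} P_{L,r} \tag{11}$$

where

$$P_{L,r} = P[\exists X \in \mathcal{U}_{L,r} : (\mathbf{a}_1^T X \mathbf{b}_1, \dots, \mathbf{a}_k^T X \mathbf{b}_k)^T = 0].$$

We now prove by contradiction that $P_{L,r} = 0$ for all $L \in \mathbb{N}$ and all $r \in \{1,\dots,R\}$. Suppose that there exist an $L \in \mathbb{N}$ and an $r \in \{1,\dots,R\}$ such that $P_{L,r} > 0$ (by definition, $P_{L,r} \ge 0$). For this pair $\{L,r\}$, we would then have

$$\liminf_{\rho\to 0} \frac{\log P_{L,r}}{\log\frac{1}{\rho}} = 0. \tag{12}$$

For $\rho > 0$, let $N_{\mathcal{U}_{L,r}}(\rho)$ be the covering number of the set $\mathcal{U}_{L,r}$ and denote corresponding covering balls centered at $M_i(\rho) \in \mathbb{R}^{m\times n}$ as $\mathcal{B}_{m\times n}(M_i(\rho),\rho)$, $i = 1,\dots,N_{\mathcal{U}_{L,r}}(\rho)$. We now fix $N_{\mathcal{U}_{L,r}}(\rho)$ matrices

$$X_i(\rho) \in \mathcal{B}_{m\times n}(M_i(\rho),\rho) \cap \mathcal{U}_{L,r}, \quad i = 1,\dots,N_{\mathcal{U}_{L,r}}(\rho). \tag{13}$$

Since

$$\mathcal{B}_{m\times n}(M_i(\rho),\rho) \subseteq \mathcal{B}_{m\times n}(X_i(\rho),2\rho), \quad i = 1,\dots,N_{\mathcal{U}_{L,r}}(\rho),$$

by the triangle inequality, we get

\begin{align}
P_{L,r} &\le \sum_{i=1}^{N_{\mathcal{U}_{L,r}}(\rho)} P[\exists X \in \mathcal{B}_{m\times n}(M_i(\rho),\rho) : (\mathbf{a}_1^T X \mathbf{b}_1, \dots, \mathbf{a}_k^T X \mathbf{b}_k)^T = 0] \notag\\
&\le \sum_{i=1}^{N_{\mathcal{U}_{L,r}}(\rho)} P[\exists X \in \mathcal{B}_{m\times n}(X_i(\rho),2\rho) : (\mathbf{a}_1^T X \mathbf{b}_1, \dots, \mathbf{a}_k^T X \mathbf{b}_k)^T = 0] \notag\\
&\le \sum_{i=1}^{N_{\mathcal{U}_{L,r}}(\rho)} P[\exists X \in \mathcal{B}_{m\times n}(X_i(\rho),2\rho) : \|(\mathbf{a}_1^T X \mathbf{b}_1, \dots, \mathbf{a}_k^T X \mathbf{b}_k)^T\|_2 < \rho], \quad \rho > 0. \tag{14}
\end{align}

Now, for $a_j \in \mathcal{B}_m(0,s)$, $b_j \in \mathcal{B}_n(0,s)$, $j = 1,\dots,k$, and $X \in \mathcal{B}_{m\times n}(X_i(\rho),2\rho)$, we have

\begin{align}
\|(a_1^T X_i(\rho) b_1, \dots, a_k^T X_i(\rho) b_k)^T\|_2
&\le \|(a_1^T (X - X_i(\rho)) b_1, \dots, a_k^T (X - X_i(\rho)) b_k)^T\|_2 + \|(a_1^T X b_1, \dots, a_k^T X b_k)^T\|_2 \notag\\
&\le \sqrt{\sum_{j=1}^{k} \|a_j\|_2^2\, \|X - X_i(\rho)\|_2^2\, \|b_j\|_2^2} + \|(a_1^T X b_1, \dots, a_k^T X b_k)^T\|_2 \notag\\
&< 2s^2\sqrt{k}\,\rho + \|(a_1^T X b_1, \dots, a_k^T X b_k)^T\|_2, \quad \rho > 0. \tag{15}
\end{align}

Inserting (15) into (14) allows us to further upper-bound $P_{L,r}$ according to

\begin{align}
P_{L,r} &\le \sum_{i=1}^{N_{\mathcal{U}_{L,r}}(\rho)} P\big[\|(\mathbf{a}_1^T X_i(\rho) \mathbf{b}_1, \dots, \mathbf{a}_k^T X_i(\rho) \mathbf{b}_k)^T\|_2 < \rho(1 + 2s^2\sqrt{k})\big] \notag\\
&\le 2^{\frac{k(m+n)}{2} - kr}\, N_{\mathcal{U}_{L,r}}(\rho)\, \rho^k\, g(L,r,k,s,\rho)^k, \quad \rho > 0, \tag{16}
\end{align}

where

$$g(L,r,k,s,\rho) = L(1 + 2s^2\sqrt{k}) \times \begin{cases} \dfrac{2}{s^2} + \dfrac{2}{s^2}\log\max\Big(\dfrac{s^2 L}{\rho(1+2s^2\sqrt{k})}, 1\Big), & \text{if } r = 1 \\[2ex] \dfrac{V(r,1)\big(\rho(1+2s^2\sqrt{k})\big)^{r-1}}{s^{2r}} + \dfrac{A(r-1,1)L^{r-1}}{s^2(r-1)}, & \text{if } r > 1. \end{cases}$$

Here, we applied the concentration of measure inequality in Lemma 4 below with $\delta = \rho(1 + 2s^2\sqrt{k})$ and used the fact that $1/\Delta(X_i(\rho)) < L$, $\sigma_1(X_i(\rho)) < L$, and $\operatorname{rank}(X_i(\rho)) = r$ (recall that by (13) all matrices $X_i(\rho)$ are in the set $\mathcal{U}_{L,r}$).

With the upper bound on $P_{L,r}$ in (16) we now get

\begin{align}
\liminf_{\rho\to 0} \frac{\log P_{L,r}}{\log\frac{1}{\rho}}
&\le \liminf_{\rho\to 0} \frac{k\big(\frac{m+n}{2} - r\big)\log 2 + \log N_{\mathcal{U}_{L,r}}(\rho) + k\log\rho + k\log g(L,r,k,s,\rho)}{\log\frac{1}{\rho}} \notag\\
&= \liminf_{\rho\to 0} \frac{\log N_{\mathcal{U}_{L,r}}(\rho)}{\log\frac{1}{\rho}} - k + \lim_{\rho\to 0} \frac{k\big(\frac{m+n}{2} - r\big)\log 2 + k\log g(L,r,k,s,\rho)}{\log\frac{1}{\rho}} \notag\\
&\le \liminf_{\rho\to 0} \frac{\log N_{\mathcal{U}}(\rho)}{\log\frac{1}{\rho}} - k \tag{17}\\
&= \underline{\dim}_B(\mathcal{U}) - k \notag\\
&< 0, \tag{18}
\end{align}

where (17) follows from $\mathcal{U}_{L,r} \subseteq \mathcal{U}$ and from the fact that the last limit in the preceding line equals zero, and in the last step we used that $\underline{\dim}_B(\mathcal{U}) < k$, by assumption. Since (18) contradicts (12), $P_{L,r} = 0$ for all $L \in \mathbb{N}$ and all $r \in \{1,\dots,R\}$. By (11), this establishes that $P = 0$. □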

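Lemma 4. Let $\mathbf{A} = [\mathbf{a}_1 \dots \mathbf{a}_k]$ and $\mathbf{B} = [\mathbf{b}_1 \dots \mathbf{b}_k]$ be independent random matrices, with columns $\mathbf{a}_i$, $i = 1,\dots,k$, independent and uniformly distributed on $\mathcal{B}_m(0,s)$ and columns $\mathbf{b}_i$, $i = 1,\dots,k$, independent and uniformly distributed on $\mathcal{B}_n(0,s)$. Suppose that $X \in \mathbb{R}^{m\times n}$ with $r = \operatorname{rank}(X) > 0$. Then, we have

$$P[\|(\mathbf{a}_1^T X \mathbf{b}_1, \dots, \mathbf{a}_k^T X \mathbf{b}_k)^T\|_2 < \delta] \le \delta^k\, 2^{\frac{k(m+n)}{2} - kr}\, f(X,s,\delta)^k$$

with $f(X,s,\delta)$ defined in (21).

Proof. We have

\begin{align}
P[\|(\mathbf{a}_1^T X \mathbf{b}_1, \dots, \mathbf{a}_k^T X \mathbf{b}_k)^T\|_2 < \delta]
&= P\Big[\sum_{i=1}^{k} (\mathbf{a}_i^T X \mathbf{b}_i)^2 < \delta^2\Big] \notag\\
&\le P[|\mathbf{a}_i^T X \mathbf{b}_i| < \delta, \text{ for all } i = 1,\dots,k] \notag\\
&= P[|\mathbf{a}^T X \mathbf{b}| < \delta]^k \tag{19}\\
&\le \delta^k\, 2^{\frac{k(m+n)}{2} - kr}\, f(X,s,\delta)^k, \tag{20}
\end{align}

where in (19) $\mathbf{a}$ and $\mathbf{b}$ are independent with $\mathbf{a}$ uniformly distributed on $\mathcal{B}_m(0,s)$ and $\mathbf{b}$ uniformly distributed on $\mathcal{B}_n(0,s)$ and, therefore, we can apply Lemma 5 below to obtain (20). □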
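The small-value probability $P[|\mathbf{a}^T X \mathbf{b}| < \delta]$ that drives this bound can be estimated by Monte Carlo simulation. The sketch below is illustrative only: the sampler `uniform_ball` uses the standard direction-times-radius construction, and the sample size, seed, test matrix, and values of $\delta$ are arbitrary assumptions.

```python
import numpy as np

def uniform_ball(num, dim, s, rng):
    """Draw num points uniformly from the ball of radius s in R^dim."""
    g = rng.standard_normal((num, dim))
    g /= np.linalg.norm(g, axis=1, keepdims=True)      # uniform directions
    radii = s * rng.random(num) ** (1.0 / dim)         # radius density ~ t^(dim-1)
    return g * radii[:, None]

def small_value_probability(X, s, deltas, num=200_000, seed=5):
    """Monte Carlo estimate of P[|a^T X b| < delta] for each delta."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    a = uniform_ball(num, m, s, rng)
    b = uniform_ball(num, n, s, rng)
    values = np.abs(np.einsum("im,mn,in->i", a, X, b))
    return np.array([np.mean(values < d) for d in deltas])

X = np.diag([1.0, 0.5]) @ np.eye(2, 3)                 # 2 x 3 matrix of rank 2
deltas = [1e-3, 1e-2, 1e-1]
probs = small_value_probability(X, s=1.0, deltas=deltas)
```

The estimates shrink as $\delta$ decreases, in line with the factor $\delta$ in front of the bound; raising the single-measurement probability to the power $k$ is what turns this into the $\rho^k$ decay needed in the proof of Lemma 3.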

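Lemma 5. Let $\mathbf{a}$ and $\mathbf{b}$ be independent random vectors, with $\mathbf{a}$ uniformly distributed on $\mathcal{B}_m(0,s)$ and $\mathbf{b}$ uniformly distributed on $\mathcal{B}_n(0,s)$. Suppose that $X \in \mathbb{R}^{m\times n}$ with $r = \operatorname{rank}(X) > 0$. Then, we have

$$P[|\mathbf{a}^T X \mathbf{b}| < \delta] \le \delta\, D_{r,m,n}\, f(X,s,\delta)$$

where$^2$

$$D_{r,m,n} = \frac{2V(n-r,1)V(m-r,1)V(r-1,1)}{V(m,1)V(n,1)} \le 2^{\frac{m+n}{2} - r}$$

and

$$f(X,s,\delta) = \frac{1}{\Delta(X)} \times \begin{cases} \dfrac{2}{s^2} + \dfrac{2}{s^2}\log\max\Big(\dfrac{s^2\sigma_1(X)}{\delta}, 1\Big), & \text{if } r = 1 \\[2ex] \dfrac{\delta^{r-1}V(r,1)}{s^{2r}} + \dfrac{A(r-1,1)\sigma_1(X)^{r-1}}{s^2(r-1)}, & \text{if } r > 1. \end{cases} \tag{21}$$

$^2$ We use the convention that $V(0,s) = 1$.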

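Proof. Using Fubini’s theorem for nonnegative measurable functions and noting that $1/V(m,s)$ and $1/V(n,s)$ are the probability density functions of $\mathbf{a}$ and $\mathbf{b}$, respectively, we can rewrite

$$P[|\mathbf{a}^T X \mathbf{b}| < \delta] = \frac{1}{V(m,s)V(n,s)} \int_{\mathcal{B}_m(0,s)} h(a)\, d\lambda^m(a)$$

with

$$h(a) = \int_{\mathcal{B}_n(0,s)} \chi_{\{b\in\mathbb{R}^n : |a^T X b| < \delta\}}(b)\, d\lambda^n(b).$$

Let $X = U\Sigma V$ be a singular value decomposition of $X$, where $U \in \mathbb{R}^{m\times m}$ and $V \in \mathbb{R}^{n\times n}$ are orthogonal and

$$\Sigma = \begin{pmatrix} D & 0 \\ 0 & 0 \end{pmatrix} \in \mathbb{R}^{m\times n}$$

with $D = \operatorname{diag}(\sigma_1(X) \dots \sigma_r(X))$. Using the fact that Lebesgue measure on $\mathcal{B}_m(0,s)$ and $\mathcal{B}_n(0,s)$ is invariant under rotations, we can rewrite

$$P[|\mathbf{a}^T X \mathbf{b}| < \delta] = \frac{1}{V(m,s)V(n,s)} \int_{\mathcal{B}_m(0,s)} h(Ua)\, d\lambda^m(a) \tag{22}$$

and

$$h(Ua) = \int_{\mathcal{B}_n(0,s)} \chi_{\{b\in\mathbb{R}^n : |a^T \Sigma b| < \delta\}}(b)\, d\lambda^n(b).$$

Decomposing $a = (a_1^T\ a_2^T)^T$ and $b = (b_1^T\ b_2^T)^T$, with $a_1, b_1 \in \mathbb{R}^r$, we can upper-bound $h(Ua)$ by

\begin{align}
h(Ua) &\le V(n-r,s) \int_{\mathcal{B}_r(0,s)} \chi_{\{b_1\in\mathbb{R}^r : |a_1^T D b_1| < \delta\}}(b_1)\, d\lambda^r(b_1) \notag\\
&\le \frac{V(n-r,s)}{\Delta(X)} \int_{\mathcal{B}_r(0,s\sigma_1(X))} \chi_{\{c\in\mathbb{R}^r : |a_1^T c| < \delta\}}(c)\, d\lambda^r(c), \notag
\end{align}

where in the last step we changed variables to $c = D b_1$ and used that $\|c\|_2 \le \sigma_1(X)\|b_1\|_2$. Using that Lebesgue measure on $\mathcal{B}_r(0,s\sigma_1(X))$ is invariant under rotations and setting $e_1 = (1\ 0\ \dots\ 0)^T \in \mathbb{R}^r$, we can further upper-bound $h(Ua)$ by

\begin{align}
h(Ua) &\le \frac{V(n-r,s)}{\Delta(X)} \int_{\mathcal{B}_r(0,s\sigma_1(X))} \chi_{\{c\in\mathbb{R}^r : |c^T e_1| < \delta/\|a_1\|_2\}}(c)\, d\lambda^r(c) \notag\\
&\le \frac{2V(n-r,s)V(r-1,s)\,\sigma_1(X)^{r-1}}{\Delta(X)} \min\Big(s\sigma_1(X), \frac{\delta}{\|a_1\|_2}\Big). \tag{23}
\end{align}

Plugging (23) into (22), we find that

$$P[|\mathbf{a}^T X \mathbf{b}| < \delta] \le D_{r,m,n}\, \frac{\sigma_1(X)^{r-1}}{\Delta(X)s^{r+1}} \int_{\mathcal{B}_r(0,s)} \min\Big(s\sigma_1(X), \frac{\delta}{\|a_1\|_2}\Big)\, d\lambda^r(a_1). \tag{24}$$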

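It remains to upper-bound the integral in (24). We can split

$$\int_{\mathcal{B}_r(0,s)} \min\Big(s\sigma_1(X), \frac{\delta}{\|a_1\|_2}\Big)\, d\lambda^r(a_1) = I_1 + I_2 \tag{25}$$

where

$$I_1 = s\sigma_1(X) \int_{\mathcal{B}_r(0,s)\,\cap\,\mathcal{B}_r\left(0,\frac{\delta}{s\sigma_1(X)}\right)} d\lambda^r(a_1) \le \frac{\delta^r V(r,1)}{(s\sigma_1(X))^{r-1}}$$

and

$$I_2 = \delta \int_{\mathcal{B}_r(0,s)\,\setminus\,\mathcal{B}_r\left(0,\frac{\delta}{s\sigma_1(X)}\right)} \frac{d\lambda^r(a_1)}{\|a_1\|_2}.$$

For $r = 1$ we get

$$I_2 = 2\delta\log\max\Big(\frac{s^2\sigma_1(X)}{\delta}, 1\Big).$$

If $r > 1$ we can change variables to polar coordinates and find that

\begin{align}
I_2 &= \delta A(r-1,1) \int_{\min\left(s,\frac{\delta}{s\sigma_1(X)}\right)}^{s} y^{r-2}\, d\lambda^1(y) \notag\\
&\le \delta A(r-1,1) \int_0^s y^{r-2}\, d\lambda^1(y) \notag\\
&= \delta\, \frac{A(r-1,1)s^{r-1}}{r-1}. \notag
\end{align}

Therefore, we have

$$I_1 + I_2 \le \delta \times \begin{cases} 2 + 2\log\max\Big(\dfrac{s^2\sigma_1(X)}{\delta}, 1\Big), & \text{if } r = 1 \\[2ex] \dfrac{\delta^{r-1}V(r,1)}{(s\sigma_1(X))^{r-1}} + \dfrac{A(r-1,1)s^{r-1}}{r-1}, & \text{if } r > 1. \end{cases} \tag{26}$$

Combining (24), (25), and (26), we finally end up with

$$P[|\mathbf{a}^T X \mathbf{b}| < \delta] \le \delta\, D_{r,m,n}\, f(X,s,\delta).$$

The upper bound on $D_{r,m,n}$ follows from $2^{k/2} \le V(k,1) \le 2^k$ for $k \in \mathbb{N}$. □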
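The two-sided bound on $V(k,1)$ quoted at the end of the proof is easy to probe numerically. The sketch below evaluates the standard closed form $V(k,1) = \pi^{k/2}/\Gamma(k/2+1)$, which is not stated in the text and is brought in here as an assumption, and records for which $k$ each inequality holds. The helper names and the floating-point tolerance are illustrative.

```python
import math

def unit_ball_volume(k):
    """Volume of the unit Euclidean ball in R^k (equals 1 for k = 0)."""
    return math.pi ** (k / 2) / math.gamma(k / 2 + 1)

def check_bounds(k, rel_tol=1e-12):
    """Return (lower_ok, upper_ok) for 2^(k/2) <= V(k,1) <= 2^k."""
    v = unit_ball_volume(k)
    lower_ok = 2.0 ** (k / 2) <= v * (1 + rel_tol)
    upper_ok = v <= 2.0 ** k * (1 + rel_tol)
    return lower_ok, upper_ok

report = {k: check_bounds(k) for k in range(1, 11)}
```

In this check the upper inequality holds for every $k$ tried, whereas the lower inequality holds for $k \le 4$ and fails from $k = 5$ on (for example, $V(5,1) = 8\pi^2/15 \approx 5.26$ while $2^{5/2} \approx 5.66$). The explicit constant $2^{(m+n)/2 - r}$ should therefore be read with care in higher dimensions. Since $D_{r,m,n}$ is in any case a finite constant that does not depend on $\rho$, the limit argument in the proof of Lemma 3 does not rely on its exact value.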


References 

[1] E. J. Candès and Y. Plan, “Tight oracle inequalities for low-rank matrix recovery from a minimal number of noisy random measurements,” IEEE Trans. Inf. Theory, vol. 57, no. 4, pp. 2342-2359, Apr. 2011.

[2] D. Gross, “Recovering low-rank matrices from few coefficients in any basis,” IEEE Trans. Inf. Theory, vol. 57, no. 3, pp. 1548-1566, Mar. 2011.

[3] E. J. Candès and B. Recht, “Exact matrix completion via convex optimization,” Found. Comput. Math., vol. 9, no. 6, pp. 717-772, 2009.

[4] E. J. Candès and T. Tao, “The power of convex relaxation: Near-optimal matrix completion,” IEEE Trans. Inf. Theory, vol. 56, no. 5, pp. 2053-2080, May 2010.

[5] B. Recht, M. Fazel, and P. A. Parrilo, “Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization,” SIAM Review, vol. 52, no. 3, pp. 471-501, Dec. 2010.

[6] B. Recht, “A simpler approach to matrix completion,” J. Mach. Learn. Res., vol. 12, pp. 3413-3430, 2011.

[7] T. T. Cai and A. Zhang, “Sparse representation of a polytope and recovery of sparse signals and low-rank matrices,” IEEE Trans. Inf. Theory, vol. 60, no. 1, pp. 122-132, Jan. 2014.

[8] T. T. Cai and A. Zhang, “ROP: Matrix recovery via rank-one projections,” Ann. Stat., vol. 43, no. 1, pp. 102-138, 2015.

[9] Y. C. Eldar, D. Needell, and Y. Plan, “Uniqueness conditions for low-rank matrix recovery,” Appl. Comp. Harm. Anal., vol. 33, no. 2, pp. 309-314, 2012.

[10] Y. Wu and S. Verdú, “Rényi information dimension: Fundamental limits of almost lossless analog compression,” IEEE Trans. Inf. Theory, vol. 56, no. 8, pp. 3721-3748, Aug. 2010.

[11] D. Stotz, E. Riegler, and H. Bölcskei, “Almost lossless analog signal separation,” in Proc. IEEE Int. Symp. Inf. Theory, Istanbul, Turkey, Jul. 2013, pp. 106-110.

[12] J. M. Lee, Introduction to Smooth Manifolds, 1st ed. New York, NY: Springer, 2000.

[13] K. Falconer, Fractal Geometry, 1st ed. New York, NY: Wiley, 1990.



